Improving the Decision Value of Hierarchical Text Clustering Using Term Overlap Detection



Similar resources

A New Method for Duplicate Detection Using Hierarchical Clustering of Records

Accuracy and validity of data are prerequisites for the proper operation of any software system. Errors can always occur in data because of human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for reasons such as aggregation of ...


Refactorings Detection Using Hierarchical Clustering

Refactoring is a process that helps to maintain internal software quality throughout the software lifecycle. This paper introduces a new hierarchical clustering algorithm that can be used to improve the design of software systems by identifying the appropriate refactorings. The algorithm is named HARD (Hierarchical Clustering Algorithm for Refactorings Determination) and uses a new...


Improving Text Search Process using Text Document Clustering Approach

Knowledge discovery and data mining is the process of retrieving meaningful knowledge from raw data using different techniques. Text mining is a subdomain of knowledge discovery that works on text data. This paper provides a different way of understanding text mining and its use in different real-time applications. This paper also includes the design of a hybrid tex...


Improving the accuracy of co-citation clustering using full text

Historically, co-citation models have been based only on bibliographic information. Full text analysis offers the opportunity to significantly improve the quality of the signals upon which these co-citation models are based. In this work we study the effect of reference proximity on the accuracy of co-citation clusters. Using a corpus of 270,521 full text documents from 2007, we compare the res...


Hierarchical clustering of large text datasets using Locality-Sensitive Hashing

In this paper, we present a hierarchical clustering algorithm for large text datasets using Locality-Sensitive Hashing (LSH). The main idea of LSH is to “hash” items several times, in such a way that similar items are more likely to be hashed to the same bucket than dissimilar items are. The main drawback of conventional hierarchical algorithms is their high time complexity (e.g. Single Linka...



Journal

Journal title: Australasian Journal of Information Systems

Year: 2015

ISSN: 1449-8618

DOI: 10.3127/ajis.v19i0.1180